
APPENDIX

Neural Information Processing Systems

The generalization ability of the agent is measured by making the set of backgrounds in the R2 seen task and the set of backgrounds in the R2 unseen task mutually exclusive.
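The evaluation protocol described above hinges on the seen and unseen background sets being disjoint. A minimal sketch of such a split, assuming backgrounds are identified by hashable items (the function name and parameters are illustrative, not from the paper):

```python
import random

def split_backgrounds(backgrounds, seen_fraction=0.8, seed=0):
    """Partition backgrounds into mutually exclusive 'seen' (training)
    and 'unseen' (evaluation) sets, as the protocol requires."""
    rng = random.Random(seed)
    shuffled = list(backgrounds)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * seen_fraction)
    seen, unseen = shuffled[:cut], shuffled[cut:]
    assert not set(seen) & set(unseen)  # the two sets must not overlap
    return seen, unseen
```

Fixing the seed keeps the split reproducible across training and evaluation runs.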



Using Bounding Boxes

Neural Information Processing Systems

We agree that it is a nice idea to exploit the bounding boxes of ImageNet, and we are happy to explore it in GFNet. We will add more comparisons on this point in our revision. The iterative process is indeed not indispensable, but in experiments it improves the accuracy. We will make these points clear in the revision. We will release all the code and pre-trained models upon the acceptance of this paper. Table 1: Results using 32x32 (left) and 64x64 (right) patches.


Supplementary Materials A Algorithm details

Neural Information Processing Systems

Our innovation of optimizing interval times is highlighted in blue in Algorithm 1. A key assumption of Algorithm 1 is that acting often, using short time intervals (i.e., maximal interaction), will not hurt performance. In many scenarios this assumption seems reasonable, and applying Algorithm 1 may work well; for example, some Atari games require frameskipping, i.e., repeating each action for several consecutive frames. Algorithm 2 assumes that the dynamics can be fully covered by random policies; however, these may be far from the optimal policy.
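The frameskipping mentioned above is commonly implemented as an action-repeat wrapper. A minimal sketch (the class name and the assumed `env.step` signature are illustrative, not the appendix's code):

```python
class ActionRepeat:
    """Frame-skipping wrapper: repeats each chosen action for `skip`
    consecutive environment steps and accumulates the rewards."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode ends
        return obs, total_reward, done, info
```

The agent then chooses one action per `skip` underlying frames, which is the fixed-interval special case of the interval times that Algorithm 1 optimizes.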



SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies

Chen, Weiqin, Paternain, Santiago

arXiv.org Artificial Intelligence

Pretrained foundation models (FMs) have exhibited extraordinary in-context learning performance, allowing zero-shot (or few-shot) generalization to new environments/tasks not encountered during pretraining. In the case of reinforcement learning (RL), in-context RL (ICRL) emerges when pretraining FMs on decision-making problems in an autoregressive-supervised manner. Nevertheless, the current state-of-the-art ICRL algorithms, such as Algorithm Distillation, Decision Pretrained Transformer, and Decision Importance Transformer, impose stringent requirements on the pretraining dataset concerning the behavior (source) policies, context information, action labels, etc. Notably, these algorithms either demand optimal policies or require varying degrees of well-trained behavior policies for all pretraining environments. This significantly hinders the application of ICRL to real-world scenarios, where acquiring optimal or well-trained policies for a substantial volume of real-world training environments can be prohibitively expensive or even intractable. To overcome this challenge, we introduce a novel approach, termed State-Action Distillation (SAD), that generates an effective pretraining dataset guided solely by random policies. In particular, SAD selects query states and corresponding action labels by distilling outstanding state-action pairs from the entire state and action spaces using random policies within a trust horizon, and then inherits the classical autoregressive-supervised mechanism during pretraining. To the best of our knowledge, this is the first work that enables effective ICRL under (e.g., uniform) random policies and random contexts. We also establish a quantitative analysis of the trustworthiness, as well as the performance guarantees, of our SAD approach.
Moreover, our empirical results across multiple popular ICRL benchmark environments demonstrate that, on average, SAD outperforms the best baseline by 236.3% in the offline evaluation and by 135.2% in the online evaluation.
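The distillation step the abstract describes can be sketched as follows. This is a heavily simplified illustration, not the authors' implementation: the environment interface (`sample_state`, `set_state`, `step`, `actions`) and all function names are assumptions, and the action label is chosen as the action whose random-continuation rollout return within the trust horizon is highest.

```python
import random

def rollout_return(env, state, first_action, horizon, rng):
    """Return of one rollout: take `first_action` from `state`, then follow
    a uniformly random policy for up to `horizon` steps (the trust horizon)."""
    env.set_state(state)  # assumes a resettable simulator
    total = 0.0
    _, reward, done = env.step(first_action)
    total += reward
    for _ in range(horizon - 1):
        if done:
            break
        _, reward, done = env.step(rng.choice(env.actions))
        total += reward
    return total

def collect_sad_dataset(env, num_queries, horizon, seed=0):
    """SAD-style pretraining data sketch: label each query state with the
    action whose random-rollout return within the trust horizon is best."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_queries):
        state = env.sample_state()
        best = max(env.actions,
                   key=lambda a: rollout_return(env, state, a, horizon, rng))
        dataset.append((state, best))  # (query state, action label)
    return dataset
```

The resulting (query state, action label) pairs would then feed the standard autoregressive-supervised pretraining loop; no optimal or well-trained behavior policy is needed at any point.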